Finding truth even if the crowd is wrong
Abstract
Over a hundred years ago, Galton reported on the uncanny accuracy of the median estimate of the weight of an ox, as judged by spectators at a country fair [1]. Since then, the notion that the ‘wisdom of the crowd’ is superior to any individual has itself become a piece of crowd wisdom, raising expectations that web-based opinion aggregation might replace expert judgment as a source of policy guidance [2, 3]. However, distilling the best answer from diverse opinions is challenging when most people hold an incorrect view [4]. We propose a method based on a new definition of the best answer: it is the one given by respondents who would be least surprised by the true answer if it were revealed. Since this definition is of interest only when the true answer is unknown, algorithmic implementation is nontrivial. We solve this problem by asking respondents not only to answer the question, but also to predict the distribution of others’ answers. Previously, it was shown that this secondary information can be used to create incentives for honest responding [5]. Here we prove that this information can also be used to identify which answer is the best answer by our new definition. Unlike multi-item analysis [6, 7] or boosting [8], our method can be applied to a unique question. This capability is critical in knowledge domains that lack consensus about which historical precedents might establish experts’ relative track records. Unlike Bayesian models [9, 10, 11, 12, 13], our method does not require user-specified prior probabilities, nor does it require information sharing that might lead to “groupthink” [14]. An experiment demonstrates that the method outperforms algorithms based on democratic or confidence-weighted voting [15, 16, 17, 18, 19].
Imagine that you have no knowledge of U.S. geography, and are confronted with the question
Similar resources
Lying to the Patient with Benevolent Intent
Telling the truth to patients is a key issue in medical ethics. Today, most physicians hold that truth-telling to patients is crucial, and that lying to patients or withholding information from them is not acceptable. It seems, however, that absolute and unconditional truth-telling is not always possible, and it may not be feasible to tell some patients certain truths under some circumstances. ...
KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing
Data cleaning with guaranteed reliability is hard to achieve without accessing external sources, since the truth is not necessarily discoverable from the data at hand. Furthermore, even in the presence of external sources, mainly knowledge bases and humans, effectively leveraging them still faces many challenges, such as aligning heterogeneous data sources and decomposing a complex task into si...
Exploring Relevance as Truth Criterion on the Web and Classifying Claims in Belief Levels
The Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the Web. Moreover, different websites often provide conflicting information on a subject. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this paper...
Epistemic Goals and Epistemic Values
The truth exerts a powerful attraction. Reading the newspaper over breakfast a few months ago, I came across the following quote from Ricky Williams, a running back for the Miami Dolphins who was in the process of walking away from his million-dollar salary to pursue a career in holistic medicine. ‘‘I’m going to search for the truth,’’ Williams said. ‘‘Everything I’m doing in my life is about f...
Robust Crowd Labeling Using Little Expertise
Crowd-labeling emerged from the need to label large-scale and complex data, a tedious, expensive, and time-consuming task. But the problem of obtaining good quality labels from a crowd and their integration is still unresolved. To address this challenge, we propose a new framework that automatically combines and boosts bulk crowd labels supported by limited number of “ground truth” labels from ...